{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qnyTxjK_GbOD"
   },
   "source": [
    "# Ungraded Lab: Beyond Hello World, A Computer Vision Example\n",
    "In the previous exercise, you saw how to create a neural network that figured out the problem you were trying to solve. This gave an explicit example of learned behavior. Of course, in that instance, it was a bit of an overkill because it would have been easier to write the function `y=2x-1` directly instead of bothering with using machine learning to learn the relationship between `x` and `y`.\n",
    "\n",
    "But what about a scenario where writing rules like that is much more difficult -- for example a computer vision problem? Let's take a look at a scenario where you will build a neural network to recognize different items of clothing, trained from a dataset containing 10 different types."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "H41FYgtlHPjW"
   },
   "source": [
    "## Start Coding\n",
    "\n",
    "You can start with importing the libraries you will need throughout this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "q3KzJyjv3rnA"
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n_n1U5do3u_F"
   },
   "source": [
    "The [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) is a collection of grayscale 28x28 pixel clothing images. Each image is associated with a label as shown in this table.\n",
    "\n",
    "| Label | Description |\n",
    "| --- | --- |\n",
    "| 0 | T-shirt/top |\n",
    "| 1 | Trouser |\n",
    "| 2 | Pullover |\n",
    "| 3 | Dress |\n",
    "| 4 | Coat |\n",
    "| 5 | Sandal |\n",
    "| 6 | Shirt |\n",
    "| 7 | Sneaker |\n",
    "| 8 | Bag |\n",
    "| 9 | Ankle boot |\n",
    "\n",
    "This dataset is available directly in the [tf.keras.datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets) API and you load it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PmxkHFpt31bM"
   },
   "outputs": [],
   "source": [
    "# Load the Fashion MNIST dataset\n",
    "fmnist = tf.keras.datasets.fashion_mnist"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GuoLQQBT4E-_"
   },
   "source": [
    "Calling `load_data()` on this object will give you two tuples with two lists each. These will be the training and testing values for the graphics that contain the clothing items and their labels.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BTdRgExe4TRB"
   },
   "outputs": [],
   "source": [
    "# Load the training and test split of the Fashion MNIST dataset\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "rw395ROx4f5Q"
   },
   "source": [
    "What do these values look like? You can print a training image (both as an image and a numpy array), and a training label to see. Experiment with different indices in the array. For example, also take a look at index `42`. That's a different boot than the one at index `0`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "FPc9d3gJ3jWF"
   },
   "outputs": [],
   "source": [
    "# You can put between 0 to 59999 here\n",
    "index = 0\n",
    "\n",
    "# Set number of characters per row when printing\n",
    "np.set_printoptions(linewidth=320)\n",
    "\n",
    "# Print the label and image\n",
    "print(f'LABEL: {training_labels[index]}')\n",
    "print(f'\\nIMAGE PIXEL ARRAY:\\n\\n{training_images[index]}\\n\\n')\n",
    "\n",
    "# Visualize the image using the default colormap (viridis)\n",
    "plt.imshow(training_images[index])\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3cbrdH225_nH"
   },
   "source": [
    "You'll notice that all of the values in the number are between 0 and 255. If you are training a neural network especially in image processing, for various reasons it will usually learn better if you scale all values to between 0 and 1. It's a process called _normalization_ and fortunately in Python, it's easy to normalize an array without looping. You do it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kRH19pWs6ZDn"
   },
   "outputs": [],
   "source": [
    "# Normalize the pixel values of the train and test images\n",
    "training_images  = training_images / 255.0\n",
    "test_images = test_images / 255.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3DkO0As46lRn"
   },
   "source": [
    "Now you might be wondering why the dataset is split into two: training and testing? Remember we spoke about this in the intro? The idea is to have 1 set of data for training, and then another set of data that the model hasn't yet seen. This will be used to evaluate how good it would be at classifying values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dIn7S9gf62ie"
   },
   "source": [
    "Let's now design the model. There's quite a few new concepts here. But don't worry, you'll get the hang of them. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7mAyndG3kVlK"
   },
   "outputs": [],
   "source": [
    "# Build the classification model\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(), \n",
    "    tf.keras.layers.Dense(128, activation=tf.nn.relu), \n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "-lUcWaiX7MFj"
   },
   "source": [
    "[Sequential](https://keras.io/api/models/sequential/): That defines a sequence of layers in the neural network.\n",
    "\n",
    "[Flatten](https://keras.io/api/layers/reshaping_layers/flatten/): Remember earlier where our images were a 28x28 pixel matrix when you printed them out? Flatten just takes that square and turns it into a 1-dimensional array.\n",
    "\n",
    "[Dense](https://keras.io/api/layers/core_layers/dense/): Adds a layer of neurons\n",
    "\n",
    "Each layer of neurons needs an [activation function](https://keras.io/api/layers/activations/) to tell them what to do. There are a lot of options, but just use these for now: \n",
    "\n",
    "[ReLU](https://keras.io/api/layers/activations/#relu-function) effectively means:\n",
    "\n",
    "```\n",
    "if x > 0: \n",
    "  return x\n",
    "\n",
    "else: \n",
    "  return 0\n",
    "```\n",
    "\n",
    "In other words, it only passes values greater than 0 to the next layer in the network.\n",
    "\n",
    "[Softmax](https://keras.io/api/layers/activations/#softmax-function) takes a list of values and scales these so the sum of all elements will be equal to 1. When applied to model outputs, you can think of the scaled values as the probability for that class. For example, in your classification model which has 10 units in the output dense layer, having the highest value at `index = 4` means that the model is most confident that the input clothing image is a coat. If it is at index = 5, then it is a sandal, and so forth. See the short code block below which demonstrates these concepts. You can also watch this [lecture](https://www.youtube.com/watch?v=LLux1SW--oM&ab_channel=DeepLearningAI) if you want to know more about the Softmax function and how the values are computed.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Dk1hzzpDoGPI"
   },
   "outputs": [],
   "source": [
    "# Declare sample inputs and convert to a tensor\n",
    "inputs = np.array([[1.0, 3.0, 4.0, 2.0]])\n",
    "inputs = tf.convert_to_tensor(inputs)\n",
    "print(f'input to softmax function: {inputs.numpy()}')\n",
    "\n",
    "# Feed the inputs to a softmax activation function\n",
    "outputs = tf.keras.activations.softmax(inputs)\n",
    "print(f'output of softmax function: {outputs.numpy()}')\n",
    "\n",
    "# Get the sum of all values after the softmax\n",
    "sum = tf.reduce_sum(outputs)\n",
    "print(f'sum of outputs: {sum}')\n",
    "\n",
    "# Get the index with highest value\n",
    "prediction = np.argmax(outputs)\n",
    "print(f'class with highest probability: {prediction}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c8vbMCqb9Mh6"
   },
   "source": [
    "The next thing to do, now that the model is defined, is to actually build it. You do this by compiling it with an optimizer and loss function as before (notice that a class is passed to the optimizer, while a string is passed to the loss, in TF you will find that there are multiple ways of defining things, for instance \"adam\" would be a valid optimizer, as well as `tf.keras.losses.SparseCategoricalCrossentropy()` will be a valid loss function) -- and then you train it by calling `model.fit()` asking it to fit your training data to your training labels. It will figure out the relationship between the training data and its actual labels so in the future if you have inputs that looks like the training data, then it can predict what the label for that input is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BLMdl9aP8nQ0"
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer = tf.optimizers.Adam(),\n",
    "              loss = 'sparse_categorical_crossentropy',\n",
    "              metrics=['accuracy'])\n",
    "\n",
    "model.fit(training_images, training_labels, epochs=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-JJMsvSB-1UY"
   },
   "source": [
    "Once it's done training -- you should see an accuracy value at the end of the final epoch. It might look something like `0.9098`. This tells you that your neural network is about 91% accurate in classifying the training data. That is, it figured out a pattern match between the image and the labels that worked 91% of the time. Not great, but not bad considering it was only trained for 5 epochs and done quite quickly.\n",
    "\n",
    "But how would it work with unseen data? That's why we have the test images and labels. We can call [`model.evaluate()`](https://keras.io/api/models/model_training_apis/#evaluate-method) with this test dataset as inputs and it will report back the loss and accuracy of the model. Let's give it a try:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WzlqsEzX9s5P"
   },
   "outputs": [],
   "source": [
    "# Evaluate the model on unseen data\n",
    "model.evaluate(test_images, test_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6tki-Aro_Uax"
   },
   "source": [
    "You can expect the accuracy here to be about `0.88` which means it was 88% accurate on the entire test set. As expected, it probably would not do as well with *unseen* data as it did with data it was trained on!  As you go through this course, you'll look at ways to improve this. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "htldZNWcIPSN"
   },
   "source": [
    "# Exploration Exercises\n",
    "\n",
    "To explore further and deepen your understanding, try the exercises below:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rquQqIx4AaGR"
   },
   "source": [
    "### Exercise 1:\n",
    "For this first exercise run the below code: It creates a set of classifications for each of the test images, and then prints the first entry in the classifications. The output, after you run it is a list of numbers. Why do you think this is, and what do those numbers represent? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RyEIki0z_hAD"
   },
   "outputs": [],
   "source": [
    "classifications = model.predict(test_images)\n",
    "\n",
    "print(classifications[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MdzqbQhRArzm"
   },
   "source": [
    "**Hint:** try running `print(test_labels[0])` -- and you'll get a `9`. Does that help you understand why this list looks the way it does? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WnBGOrMiA1n5"
   },
   "outputs": [],
   "source": [
    "print(test_labels[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uUs7eqr7uSvs"
   },
   "source": [
    "### E1Q1: What does this list represent?\n",
    "\n",
    "\n",
    "1.   It's 10 random meaningless values\n",
    "2.   It's the first 10 classifications that the computer made\n",
    "3.   It's the probability that this item is each of the 10 classes\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wAbr92RTA67u"
   },
   "source": [
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer: \n",
    "The correct answer is (3)\n",
    "\n",
    "The output of the model is a list of 10 numbers. These numbers are a probability that the value being classified is the corresponding value (https://github.com/zalandoresearch/fashion-mnist#labels), i.e. the first value in the list is the probability that the image is of a '0' (T-shirt/top), the next is a '1' (Trouser) etc. Notice that they are all VERY LOW probabilities.\n",
    "\n",
    "For index 9 (Ankle boot), the probability was in the 90's, i.e. the neural network is telling us that the image is most likely an ankle boot.\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CD4kC6TBu-69"
   },
   "source": [
    "### E1Q2: How do you know that this list tells you that the item is an ankle boot?\n",
    "\n",
    "\n",
    "1.   There's not enough information to answer that question\n",
    "2.   The 10th element on the list is the biggest, and the ankle boot is labelled 9\n",
    "2.   The ankle boot is label 9, and there are 0->9 elements in the list\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I-haLncrva5L"
   },
   "source": [
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer\n",
    "The correct answer is (2). Both the list and the labels are 0 based, so the ankle boot having label 9 means that it is the 10th of the 10 classes. The list having the 10th element being the highest value means that the Neural Network has predicted that the item it is classifying is most likely an ankle boot\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OgQSIfDSOWv6"
   },
   "source": [
    "### Exercise 2: \n",
    "Now look at the layers in your model. Experiment with different values for the dense layer with 512 neurons. What different results do you get for loss, training time etc? Why do you think that's the case? \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GSZSwV5UObQP"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(512, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)\n",
    "\n",
    "print(\"\\nPredicting using test set:\\n\")\n",
    "predictions = model.predict(test_images)\n",
    "\n",
    "print(f\"\\nTrue class for first image on test set: {test_labels[0]}\\nProbability of each class:\\n{predictions[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bOOEnHZFv5cS"
   },
   "source": [
    "### E2Q1: Increase to 1024 Neurons -- What's the impact?\n",
    "\n",
    "1. Training takes longer, but is more accurate\n",
    "2. Training takes longer, but no impact on accuracy\n",
    "3. Training takes the same time, but is more accurate\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "U73MUP2lwrI2"
   },
   "source": [
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer\n",
    "The correct answer is (1) by adding more Neurons we have to do more calculations, slowing down the process, but in this case they have a good impact -- we do get more accurate. That doesn't mean it's always a case of 'more is better', you can hit the law of diminishing returns very quickly!\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WtWxK16hQxLN"
   },
   "source": [
    "### Exercise 3: \n",
    "\n",
    "### E3Q1: What would happen if you remove the Flatten() layer. Why do you think that's the case? \n",
    "\n",
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer\n",
    "You get an error about the shape of the data. It may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to 'flatten' that 28,28 into a 784x1. Instead of writing all the code to handle that ourselves, we add the Flatten() layer at the begining, and when the arrays are loaded into the model later, they'll automatically be flattened for us.\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ExNxCwhcQ18S"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(64, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)\n",
    "\n",
    "print(\"\\nPredicting using test set:\\n\")\n",
    "predictions = model.predict(test_images)\n",
    "\n",
    "print(f\"\\nTrue class for first image on test set: {test_labels[0]}\\nProbability of each class:\\n{predictions[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VqoCR-ieSGDg"
   },
   "source": [
    "### Exercise 4: \n",
    "\n",
    "Consider the final (output) layers. Why are there 10 of them? What would happen if you had a different amount than 10? For example, try training the network with 5.\n",
    "\n",
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer\n",
    "You get an error as soon as it finds an unexpected value. Another rule of thumb -- the number of neurons in the last layer should match the number of classes you are classifying for. In this case it's the digits 0-9, so there are 10 of them, hence you should have 10 neurons in your final layer.\n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MMckVntcSPvo"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(64, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)\n",
    "\n",
    "print(\"\\nPredicting using test set:\\n\")\n",
    "predictions = model.predict(test_images)\n",
    "\n",
    "print(f\"\\nTrue class for first image on test set: {test_labels[0]}\\nProbability of each class:\\n{predictions[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-0lF5MuvSuZF"
   },
   "source": [
    "### Exercise 5: \n",
    "\n",
    "Consider the effects of additional layers in the network. What will happen if you add another layer between the one with 512 and the final layer with 10. \n",
    "\n",
    "<details><summary>Click for Answer</summary>\n",
    "<p>\n",
    "\n",
    "#### Answer \n",
    "There isn't a significant impact -- because this is relatively simple data. For far more complex data (including color images to be classified as flowers that you'll see in the next lesson), extra layers are often necessary. \n",
    "\n",
    "</p>\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "b1YPa6UhS8Es"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28, 28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    # Add a layer here,\n",
    "    tf.keras.layers.Dense(256, activation=tf.nn.relu),\n",
    "    # Add a layer here\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)\n",
    "\n",
    "print(\"\\nPredicting using test set:\\n\")\n",
    "predictions = model.predict(test_images)\n",
    "\n",
    "print(f\"\\nTrue class for first image on test set: {test_labels[0]}\\nProbability of each class:\\n{predictions[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bql9fyaNUSFy"
   },
   "source": [
    "### Exercise 6: \n",
    "\n",
    "### E6Q1: Consider the impact of training for more or less epochs. Why do you think that would be the case? \n",
    "\n",
    "- Try 15 epochs -- you'll probably get a model with a much better loss than the one with 5\n",
    "- Try 30 epochs -- you might see the loss value decrease more slowly, and sometimes increases. You'll also likely see that the results of `model.evaluate()` didn't improve much. It can even be slightly worse.\n",
    "\n",
    "This is a side effect of something called 'overfitting' which you can learn about later and it's something you need to keep an eye out for when training neural networks. There's no point in wasting your time training if you aren't improving your loss, right! :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "uE3esj9BURQe"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images = training_images / 255.0\n",
    "test_images = test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(128, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HS3vVkOgCDGZ"
   },
   "source": [
    "### Exercise 7: \n",
    "\n",
    "Before you trained, you normalized the data, going from values that were 0-255 to values that were 0-1. What would be the impact of removing that? Here's the complete code to give it a try. Why do you think you get different results? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JDqNAqrpCNg0"
   },
   "outputs": [],
   "source": [
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images=training_images / 255.0 # Experiment with removing this line\n",
    "test_images=test_images / 255.0 # Experiment with removing this line\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(512, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "\n",
    "print(\"Training:\\n\")\n",
    "model.fit(training_images, training_labels, epochs=5)\n",
    "\n",
    "print(\"\\nEvaluating on test set:\\n\")\n",
    "model.evaluate(test_images, test_labels)\n",
    "\n",
    "print(\"\\nPredicting using test set:\\n\")\n",
    "predictions = model.predict(test_images)\n",
    "\n",
    "print(f\"\\nTrue class for first image on test set: {test_labels[0]}\\nProbability of each class:\\n{predictions[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "E7W2PT66ZBHQ"
   },
   "source": [
    "### Exercise 8: \n",
    "\n",
    "Earlier when you trained for extra epochs you had an issue where your loss might change. It might have taken a bit of time for you to wait for the training to do that, and you might have thought 'wouldn't it be nice if I could stop the training when I reach a desired value?' -- i.e. 60% accuracy might be enough for you, and if you reach that after 3 epochs, why sit around waiting for it to finish a lot more epochs....So how would you fix that? Like any other program...you have callbacks! Let's see them in action..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "pkaEHHgqZbYv"
   },
   "outputs": [],
   "source": [
    "class myCallback(tf.keras.callbacks.Callback):\n",
    "    def on_epoch_end(self, epoch, logs={}):\n",
    "        if(logs.get('accuracy') >= 0.6): # Experiment with changing this value\n",
    "            print(\"\\nReached 60% accuracy so cancelling training!\")\n",
    "            self.model.stop_training = True\n",
    "\n",
    "callbacks = myCallback()\n",
    "\n",
    "fmnist = tf.keras.datasets.fashion_mnist\n",
    "(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()\n",
    "\n",
    "training_images=training_images / 255.0\n",
    "test_images=test_images / 255.0\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(28,28)),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(512, activation=tf.nn.relu), # Try experimenting with this layer\n",
    "    tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
    "])\n",
    "\n",
    "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n",
    "model.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "C1_W2_Lab_1_beyond_hello_world.ipynb",
   "private_outputs": true,
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "bc58f1a9918615c43466b117602939cc46a8cba292d69906d63eff60c7bc7f26"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
